fix device_id bug for final_state op in multiprocess testcase #41407

Merged: 3 commits merged on Apr 6, 2022


pangyoki (Contributor) commented Apr 5, 2022

PR types

Bug fixes

PR changes

Others

Describe

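  • Problem
    In the distributed multi-process unit test test_eager_dist_api.py, using a final-state op raises an error in GetDeviceContextByBackend when it fetches the device.

  • Cause
    In the distributed multi-process scenario, when the child process for gpu1 runs, GetCurrentDeviceId returns place0, but the expected device is place1. As a result, DeviceContextPool cannot Get a context for the corresponding place.

  • Solution
    In the new dynamic graph (eager) mode, call SetDeviceId to set place.device before executing a kernel.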
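The failure and the fix described above can be modeled in a few lines of Python. This is a hypothetical sketch, not Paddle code: `Place`, `DeviceContextPool`, `ChildProcess`, and `run_kernel` are illustrative stand-ins. It assumes the gpu1 worker's pool only holds a context for gpu:1, and that a freshly started process reports device 0 as current until something sets it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """Hypothetical stand-in for a device place, e.g. gpu:1."""
    device_type: str
    device: int


class DeviceContextPool:
    """Hypothetical pool: one context per place registered in this process."""

    def __init__(self, places):
        self._contexts = {p: f"ctx({p.device_type}:{p.device})" for p in places}

    def get(self, place):
        if place not in self._contexts:
            raise RuntimeError(f"DeviceContextPool has no context for {place}")
        return self._contexts[place]


class ChildProcess:
    """Models one worker of a multi-process test (e.g. the gpu1 worker)."""

    def __init__(self, visible_places):
        # Assumption: a freshly started process reports device 0 as current.
        self.current_device_id = 0
        self.pool = DeviceContextPool(visible_places)

    def get_current_device_id(self):
        return self.current_device_id

    def set_device_id(self, device_id):
        self.current_device_id = device_id

    def run_kernel(self, tensor_place, set_device_first=True):
        if set_device_first:
            # The fix: pin the current device to the input's place.device
            # before the kernel looks up its device context.
            self.set_device_id(tensor_place.device)
        # The lookup uses the *current* device id, not tensor_place.device.
        place = Place(tensor_place.device_type, self.get_current_device_id())
        return self.pool.get(place)


if __name__ == "__main__":
    worker = ChildProcess([Place("gpu", 1)])
    try:
        worker.run_kernel(Place("gpu", 1), set_device_first=False)
    except RuntimeError as err:
        print("before fix:", err)
    print("after fix:", worker.run_kernel(Place("gpu", 1)))
```

Without the `set_device_id` call, the lookup is built from device 0 and misses the only registered place (gpu:1), which mirrors the GetDeviceContextByBackend error. Per the commit "set device_id in eager_gen", the real fix lives in the eager code generator, so each generated final-state API sets the device before dispatching its kernel; the sketch only reproduces that ordering.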

paddle-bot-old bot commented Apr 5, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@pangyoki pangyoki changed the title support final_state op in multiprocess testcase fix device_id bug for final_state op in multiprocess testcase Apr 6, 2022
@pangyoki pangyoki closed this Apr 6, 2022
@pangyoki pangyoki reopened this Apr 6, 2022
@pangyoki pangyoki closed this Apr 6, 2022
@pangyoki pangyoki reopened this Apr 6, 2022
@pangyoki pangyoki merged commit b25f25d into PaddlePaddle:develop Apr 6, 2022
pangyoki added a commit to pangyoki/Paddle that referenced this pull request Apr 6, 2022
…cess testcase (PaddlePaddle#41407)

* support final_state in multiprocess

* fix no place.device

* set device_id in eager_gen
lanxianghit pushed a commit that referenced this pull request Apr 7, 2022
douch pushed a commit to douch/Paddle that referenced this pull request Apr 10, 2022
…Paddle#41407)

* support final_state in multiprocess

* fix no place.device

* set device_id in eager_gen
Thunderbrook pushed a commit that referenced this pull request Apr 22, 2022
* [cherry-pick2.3]fix compile bug of windows cuda11.5 (#41464)

cherry-pick

fix compile bug of windows cuda11.5 #41433

* fix bug of missing boost when compile cache.cc (#41449)

[cherry-pick #41430] fix bug of random compile failure due to incorrect compile order of dependencies

* Fix eager try catch (#41438) (#41477)

[Cherry-Pick]Fix eager try catch (#41438)

* Cherry-pick-PR41407, fix device_id bug for final_state op in multiprocess testcase (#41407) (#41475)

Cherry-pick PR #41407

* [BugFix] Add error hint for one_hot gpu version (#41335) (#41495)

* add one_hot gpu hint

* move allow_out_of_range judgement

* delete useless unittest

* fix bugs of reshape double grad infermeta (#41459) (#41493)

* [cherrypick-2.3] modify infer gpu memory strategy (#41427), remove cudnn_deterministic=True (#41341)  (#41491)

Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>

* [Cherry-pick][ROCm] fix dcu error in device event base, test=develop (#41523)

Cherry-pick of #41521

* [Cherry-Pick]Cherry pick PR41200, PR41474, PR41382 (#41509)

* Use `self` as a parameter of the _hash_with_id function to avoid errors caused by hash_id reuse (#41200)

* Add fill_constant_batch_size YAML and UT (#41474)

* Switch some dy2st UT to eager mode (#41382)

* Switch some dy2st UT to eager mode

* Fix test_lstm and remove test_transformer

* Run test_resnet_v2 in old dy mode

* Unittest recover (#41431)

* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix

* remove some interface

* fix

* remove

* code style

* recover

* fix

* remove unused code

* remove some unused table & accessor & CommonDenseTable => MemoryDenseTable

* fix

* fix

* fix

* recover

* remove unused code

* recover unittest

* fix

* remove

* fix

* remove unused code

* remove

* fix

* recover

* remove

Co-authored-by: esythan <esythan@126.com>

* add ssd sparse table

* fix

* add cache shuffle

* fix

* fix

* fix

* fix

* fix

* fix

* add unit test

* fix

Co-authored-by: Zhou Wei <1183042833@qq.com>
Co-authored-by: Sing_chan <51314274+betterpig@users.noreply.github.com>
Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Co-authored-by: Siming Dai <908660116@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: Zhang Jun <ewalker@live.cn>
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: esythan <esythan@126.com>

2 participants